"""
Sources:
https://www.kaggle.com/datasets/therealoise/top-1000-highest-grossing-movies-of-all-time
https://stackoverflow.com/questions/31521526/convert-currency-to-float-and-parentheses-indicate-negative-amounts
https://stackoverflow.com/questions/39173813/pandas-convert-dtype-object-to-int
https://stackoverflow.com/questions/29077188/absolute-value-for-column-in-python
Question(s): Do movies that receive higher movie ratings and metascores generate higher worldwide lifetime gross revenue?
Second question for fun: Does the average IMDb user enjoy the same movies as a reputed critic or publication?
"""
import pandas as pd
import re
df = pd.read_csv("movie_data.csv")
#Drops rows whose Metascore was entered as "******" instead of NaN or null on Kaggle.
#The .copy() avoids a SettingWithCopyWarning when new columns are assigned below.
df = df[df["Metascore"] != "******"].copy()
#Converts "Worldwide LT Gross" from object to float and "Metascore" from object to a numeric type.
#The raw string keeps "\$" from being read as an invalid escape sequence.
df["Worldwide LT Gross"] = df["Worldwide LT Gross"].replace(r"[\$,]", "", regex = True).astype(float)
df["Metascore"] = pd.to_numeric(df["Metascore"])
#Converts the "Movie Rating" scale from 0-10 to 0-100 to compare to "Metascore"
df["Movie Rating"] = df["Movie Rating"].multiply(10)
#Compares "Movie Rating" to "Metascore" and stores the absolute difference in "Rating Difference".
df["Rating Difference"] = (df["Movie Rating"] - df["Metascore"]).abs()
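#Restricts the correlation to numeric columns; recent pandas versions raise an error on text columns.
df.select_dtypes("number").corr()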
| | Movie Rating | Duration | Worldwide LT Gross | Metascore | Rating Difference |
|---|---|---|---|---|---|
| Movie Rating | 1.000000 | 0.380574 | 0.253547 | 0.773675 | -0.239902 |
| Duration | 0.380574 | 1.000000 | 0.288802 | 0.249778 | -0.070324 |
| Worldwide LT Gross | 0.253547 | 0.288802 | 1.000000 | 0.202954 | -0.126050 |
| Metascore | 0.773675 | 0.249778 | 0.202954 | 1.000000 | -0.645951 |
| Rating Difference | -0.239902 | -0.070324 | -0.126050 | -0.645951 | 1.000000 |
df
| | Movie Title | Year of Realease | Genre | Movie Rating | Duration | Gross | Worldwide LT Gross | Metascore | Votes | Logline | Rating Difference |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Avatar | 2009 | Action,Adventure,Fantasy | 78.0 | 162 | $760.51M | 2.847397e+09 | 83 | 1,236,962 | A paraplegic Marine dispatched to the moon Pan... | 5.0 |
| 1 | Avengers: Endgame | 2019 | Action,Adventure,Drama | 84.0 | 181 | $858.37M | 2.797501e+09 | 78 | 1,108,641 | After the devastating events of Avengers: Infi... | 6.0 |
| 2 | Titanic | 1997 | Drama,Romance | 79.0 | 194 | $659.33M | 2.201647e+09 | 75 | 1,162,142 | A seventeen-year-old aristocrat falls in love ... | 4.0 |
| 3 | Star Wars: Episode VII - The Force Awakens | 2015 | Action,Adventure,Sci-Fi | 78.0 | 138 | $936.66M | 2.069522e+09 | 80 | 925,551 | As a new threat to the galaxy rises, Rey, a de... | 2.0 |
| 4 | Avengers: Infinity War | 2018 | Action,Adventure,Sci-Fi | 84.0 | 149 | $678.82M | 2.048360e+09 | 68 | 1,062,517 | The Avengers and their allies must be willing ... | 16.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 995 | The A-Team | 2010 | Action,Adventure,Thriller | 67.0 | 117 | $77.22M | 1.772388e+08 | 47 | 259,316 | A group of Iraq War veterans look to clear the... | 20.0 |
| 996 | Tootsie | 1982 | Comedy,Drama,Romance | 74.0 | 116 | $177.20M | 1.772003e+08 | 88 | 107,311 | Michael Dorsey, an unsuccessful actor, disguis... | 14.0 |
| 997 | In the Line of Fire | 1993 | Action,Crime,Drama | 72.0 | 128 | $102.31M | 1.769972e+08 | 74 | 104,598 | Secret Service agent Frank Horrigan couldn't s... | 2.0 |
| 998 | Analyze This | 1999 | Comedy,Crime | 67.0 | 103 | $106.89M | 1.768857e+08 | 61 | 154,726 | A comedy about a psychiatrist whose number-one... | 6.0 |
| 999 | The Hitman's Bodyguard | 2017 | Action,Comedy,Crime | 69.0 | 118 | $75.47M | 1.766002e+08 | 47 | 230,821 | One of the world's top bodyguards gets a new c... | 22.0 |
964 rows × 11 columns
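The cleaning steps above (dropping the "******" placeholder and stripping "$" and "," from the gross column) can be illustrated on plain strings. This is a minimal sketch using only the standard library, not the notebook's pandas code; the helper names `parse_gross` and `parse_metascore` and the two sample rows are hypothetical and are not taken from the dataset.

```python
import re

PLACEHOLDER = "******"  # marker the dataset uses for a missing Metascore


def parse_gross(text):
    """Convert a currency string such as "$1,234,567" to a float."""
    return float(re.sub(r"[\$,]", "", text))


def parse_metascore(text):
    """Return the Metascore as an int, or None for the placeholder."""
    text = text.strip()
    return None if text == PLACEHOLDER else int(text)


# Hypothetical rows (title, gross, metascore) made up for illustration.
rows = [
    ("Movie A", "$1,234,567", "83"),
    ("Movie B", "$987,654", "******"),
]

cleaned = []
for title, gross, score in rows:
    metascore = parse_metascore(score)
    if metascore is None:
        continue  # skip rows without a Metascore, as the notebook does
    cleaned.append((title, parse_gross(gross), metascore))

print(cleaned)
```

Only "Movie A" survives, because "Movie B" carries the placeholder instead of a score.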
import plotly.express as px
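#Scatter plots of each score against worldwide lifetime gross.
#Passing df with column names labels the axes automatically.
fig1 = px.scatter(df, x = "Movie Rating", y = "Worldwide LT Gross")
fig1.show()
fig2 = px.scatter(df, x = "Metascore", y = "Worldwide LT Gross")
fig2.show()
#Overlaid histograms of both scores on the shared 0-100 scale.
fig3 = px.histogram(df, x = ["Metascore","Movie Rating"], barmode = "overlay")
fig3.show()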
IMDb, the world's largest database for films, created a list of the top 1,000 highest-grossing movies of all time along with several other fields such as the movie rating, domestic total gross, worldwide gross, and Metascore. To answer whether movies with higher ratings and scores generate higher worldwide gross revenue, one must first define worldwide lifetime gross, movie rating, and Metascore. Worldwide lifetime gross is the total revenue from international and domestic markets combined, before accounting for expenses. The movie rating is the weighted average score that a film received from registered IMDb users, while the Metascore is a weighted score from various reputed critics and publications. Movie ratings range from 1 to 10 and allow decimals, while Metascores range from 0 to 100 and allow only whole numbers. To compare the two fairly, movie ratings were multiplied by 10, which puts them on a 10-100 scale.

Movies without a Metascore were dropped, leaving 964 movies. These movies lacked a Metascore because they were released before Metascores were created or because they originated in a foreign country such as China. IMDb does not adjust worldwide gross revenue for inflation, which skews the data in favor of movies that were released more recently.

According to the correlation table and scatter plots, movie rating has only a weak positive correlation with worldwide lifetime gross (r ≈ 0.25), and Metascore has an even weaker one (r ≈ 0.20). There does, however, seem to be a strong positive correlation between movie ratings and Metascore (r ≈ 0.77 in the correlation table). The average IMDb user and Metascore critic seem to generally agree on movie scores.

The histogram shows that Metascore critics tend to be harsher when scoring movies, while IMDb users tend to enjoy the average movie: no movie received a user rating below 2.5, or 25 when converted to the Metascore scale. At the opposite end of the spectrum, some movies were rated "Must-See" by Metascore and one ("The Godfather") was rated as perfect, while no movie's averaged IMDb user rating reached a perfect score.
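The coefficients in the correlation table are Pearson's r, which is the default method of pandas' `corr()`. As a minimal sketch of what that number measures, the calculation can be written with only the standard library. The helper name `pearson_r` is hypothetical, and the five value pairs are the rescaled ratings and Metascores of rows 0, 1, 2, 995, and 999 in the table preview; a sample this small says nothing about the full dataset.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Rescaled "Movie Rating" and "Metascore" for five rows of the preview.
scaled_ratings = [78.0, 84.0, 79.0, 67.0, 69.0]
metascores = [83, 78, 75, 47, 47]

# Absolute gap between users and critics, as in "Rating Difference".
gaps = [abs(r - m) for r, m in zip(scaled_ratings, metascores)]

r_scaled = pearson_r(scaled_ratings, metascores)
# Dividing by 10 restores the original 1-10 scale.
r_original = pearson_r([r / 10 for r in scaled_ratings], metascores)

print(gaps)
print(round(r_scaled, 3), round(r_original, 3))
```

The two printed coefficients agree because Pearson's r is unchanged when one variable is multiplied by a positive constant. Rescaling the ratings by 10 therefore affects only the "Rating Difference" column and the histogram, not the correlations.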